From the three data sets, “Travel Expenses issued by Minister officials,” is interesting for me to do the exploratory data analysis. The use of information in the data set has Open Government License – Canada. This dataset consolidates all the travel expense reports submitted by federal institutions.
I would like to analyse the impact of the travel expenses for the total cost, organisational impact for the travel, and the organisations which has highest travel expenses. The reason why I would like to base my analysis in expenses is to see which category of expenses such as airfare, other transport, lodging, or meals which would affect the total cost of expenses. Also, I would like to see any travel patterns throughout the years to find out the reasoning for the fluctuations, whether it is due to number of organisations that has increased yearly or if there are particular organisations directly impacted by the travel expenses.
Proactive Disclosure of Travel expenses has two csv data sets,
I will be considering only the proactive disclosure - travel expenses dataset. There is a data dictionary-Travel which talks about how data has been entered, optional columns, mandatory columns, and definitions of the column themes.
Proactive Disclosure - travel Expenses dataset consists of 21 columns, 12 columns that are mandatory and 9 that are optional. The data has been collected from 2003 December to 2021 March. Ten columns such as Title in English, Travel start date, Airfare, Other transportation, Lodging, Meals, Other Expenses, Total and owner organisation would interest me to achieve my goals. The tolal 11 columns are not considered. Among those, most columns are in French and the end date, person name, destination, and travel description columns are not required in my analysis.
import altair as alt
import pandas as pd
import numpy as np
tr_ex = pd.read_csv("../travel_expenses.csv",
parse_dates=["start_date"],
usecols=[
"title_en",
"start_date",
"destination_en",
"airfare",
"other_transport",
"lodging",
"meals",
"other_expenses",
"total",
"owner_org",
],
)
tr_ex
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | NaN | 1611.00 | aafc-aac |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | NaN | 863.55 | aafc-aac |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70344 | Chair | 2020-06-25 | Whitehorse | NaN | 242.44 | 0.00 | 105.35 | NaN | 347.79 | yesab-oeesy |
| 70345 | Chair | 2020-07-30 | Whitehorse | NaN | 242.44 | 0.00 | 105.35 | NaN | 347.79 | yesab-oeesy |
| 70346 | Chair | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | NaN | 224.20 | yesab-oeesy |
| 70347 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | NaN | 224.20 | yesab-oeesy |
| 70348 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 0.00 | 0.00 | 21.20 | NaN | 3402.20 | yesab-oeesy |
70349 rows × 10 columns
tr_ex.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 70349 entries, 0 to 70348 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 title_en 69641 non-null object 1 start_date 70349 non-null datetime64[ns] 2 destination_en 70017 non-null object 3 airfare 67151 non-null float64 4 other_transport 68562 non-null float64 5 lodging 67889 non-null float64 6 meals 69149 non-null float64 7 other_expenses 63290 non-null float64 8 total 70160 non-null float64 9 owner_org 70349 non-null object dtypes: datetime64[ns](1), float64(6), object(3) memory usage: 5.4+ MB
tr_ex_dnull = tr_ex.fillna(value=0)
tr_ex_dnull
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.0 | 1611.00 | aafc-aac |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.0 | 863.55 | aafc-aac |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70344 | Chair | 2020-06-25 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy |
| 70345 | Chair | 2020-07-30 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy |
| 70346 | Chair | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy |
| 70347 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy |
| 70348 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 0.00 | 0.00 | 21.20 | 0.0 | 3402.20 | yesab-oeesy |
70349 rows × 10 columns
tr_ex_dnull.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 70349 entries, 0 to 70348 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 title_en 70349 non-null object 1 start_date 70349 non-null datetime64[ns] 2 destination_en 70349 non-null object 3 airfare 70349 non-null float64 4 other_transport 70349 non-null float64 5 lodging 70349 non-null float64 6 meals 70349 non-null float64 7 other_expenses 70349 non-null float64 8 total 70349 non-null float64 9 owner_org 70349 non-null object dtypes: datetime64[ns](1), float64(6), object(3) memory usage: 5.4+ MB
Above infomation verifies null values have been replaced by zeros.
As in the EDA there are values in the bdataframe entered as negative values.Therefore assumed those values have been entered incorrectly and the whole dataframe going to be filtered for the less than zero values
tr_ex_neg = tr_ex_dnull[
(tr_ex_dnull["airfare"] < 0)
| (tr_ex_dnull["other_transport"] < 0)
| (tr_ex_dnull["other_expenses"] < 0)
| (tr_ex_dnull["total"] < 0)
]
tr_ex_neg.index.tolist()
[4997, 9839, 13026, 22733, 22736, 22774, 22922, 22923, 40716, 52043, 52423, 53490, 60931, 60934, 64685]
tr_ex_rem = tr_ex_dnull.drop(
index=[
4997,
9839,
13026,
22733,
22736,
22774,
22922,
22923,
40716,
52043,
52423,
53490,
60931,
60934,
64685,
]
)
tr_ex_rem
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.0 | 1611.00 | aafc-aac |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.0 | 863.55 | aafc-aac |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70344 | Chair | 2020-06-25 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy |
| 70345 | Chair | 2020-07-30 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy |
| 70346 | Chair | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy |
| 70347 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy |
| 70348 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 0.00 | 0.00 | 21.20 | 0.0 | 3402.20 | yesab-oeesy |
70334 rows × 10 columns
tr_ex_rem.describe()
| airfare | other_transport | lodging | meals | other_expenses | total | |
|---|---|---|---|---|---|---|
| count | 7.033400e+04 | 7.033400e+04 | 7.033400e+04 | 70334.000000 | 70334.000000 | 70334.000000 |
| mean | 2.606380e+13 | 1.651682e+02 | 3.929304e+02 | 195.355313 | 21.235897 | 1756.637541 |
| std | 6.912264e+15 | 7.264727e+03 | 1.218554e+04 | 3271.434343 | 116.634977 | 3266.412409 |
| min | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000e+00 | 2.500000e+01 | 0.000000e+00 | 38.000000 | 0.000000 | 479.577500 |
| 50% | 6.116900e+02 | 9.300000e+01 | 1.887200e+02 | 124.700000 | 0.000000 | 1089.600000 |
| 75% | 1.165310e+03 | 1.932000e+02 | 4.508325e+02 | 236.150000 | 12.000000 | 2027.325000 |
| max | 1.833171e+18 | 1.926107e+06 | 3.228626e+06 | 865324.600000 | 11366.910000 | 315315.000000 |
According to the Exploratory data analysis, it shows from 2017 to 2019 huge increase of travel. Therefore the data analysis going to be based on the 2016 to 2019 to find the reasons for the sudden increase.
First considering the data from year 2019, the highest travel expenses egsists among all the years form 2003 to 2021
# new_colyear = tr_ex_rem['start_date'].dt.year
new_colyear = tr_ex_rem.assign(year=tr_ex_rem["start_date"].dt.year)
new_colyear
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac | 2017 |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac | 2017 |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac | 2017 |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.0 | 1611.00 | aafc-aac | 2017 |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.0 | 863.55 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70344 | Chair | 2020-06-25 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy | 2020 |
| 70345 | Chair | 2020-07-30 | Whitehorse | 0.00 | 242.44 | 0.00 | 105.35 | 0.0 | 347.79 | yesab-oeesy | 2020 |
| 70346 | Chair | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy | 2020 |
| 70347 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 203.00 | 0.00 | 21.20 | 0.0 | 224.20 | yesab-oeesy | 2020 |
| 70348 | Executive Committee Member | 2020-09-23 | Haines Junction, Rainbow Lake, Quill Creek | 3381.00 | 0.00 | 0.00 | 21.20 | 0.0 | 3402.20 | yesab-oeesy | 2020 |
70334 rows × 11 columns
year2019_f = new_colyear[new_colyear["year"] == 2019]
year2019_f
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 287 | Driver | 2019-12-01 | Ottawa (Ontario) Canada | 0.00 | 6.20 | 140.17 | 105.50 | 0.00 | 251.87 | aafc-aac | 2019 |
| 411 | Deputy Minister | 2019-06-10 | Montreal (Quebec) Canada | 0.00 | 193.23 | 232.05 | 50.65 | 0.00 | 475.93 | aafc-aac | 2019 |
| 429 | Senior Communications and Policy Advisor | 2019-02-12 | Burlington, Ontario | 0.00 | 0.00 | 145.72 | 0.00 | 0.00 | 145.72 | aafc-aac | 2019 |
| 436 | Chief of Staff | 2019-02-19 | Washington, District of Columbia | 1201.63 | 6.32 | 578.70 | 346.79 | 0.00 | 2133.44 | aafc-aac | 2019 |
| 444 | Associate Deputy Minister | 2019-01-14 | Montreal, Quebec | 0.00 | 65.30 | 0.00 | 0.00 | 0.00 | 65.30 | aafc-aac | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70298 | Assistant Deputy Minister (Alberta) | 2019-12-16 | Calgary, AB | 0.00 | 136.04 | 223.42 | 75.55 | 12.43 | 447.44 | wd-deo | 2019 |
| 70299 | Assistant Deputy Minister (British Columbia) | 2019-12-16 | Edmonton, AB | 92.40 | 180.57 | 357.02 | 194.20 | 12.32 | 836.51 | wd-deo | 2019 |
| 70300 | Assistant Deputy Minister, Policy and Strategi... | 2019-12-16 | Edmonton, AB | 1055.51 | 137.26 | 786.80 | 371.10 | 12.43 | 2363.10 | wd-deo | 2019 |
| 70305 | Assistant Deputy Minister (Saskatchewan) | 2019-11-14 | Saskatoon, Saskatchewan | 0.00 | 114.50 | 0.00 | 0.00 | 7.33 | 121.83 | wd-deo | 2019 |
| 70316 | Assistant Deputy Minister (British Columbia) | 2019-12-09 | Vancouver, British Columbia | 0.00 | 28.35 | 0.00 | 0.00 | 0.00 | 28.35 | wd-deo | 2019 |
10957 rows × 11 columns
To get an overall pitcture of the owner organisation travel within the year 2019, point map is used to plot total of expenses verses start date
alt.data_transformers.disable_max_rows()
year2019_tr = (
alt.Chart(year2019_f)
.mark_point(size=8)
.encode(
x=alt.X("start_date", title="Travel Dates figure 1"),
y=alt.Y("total", title="Total travel expenses"),
)
.properties(title="Travel vs Expenses 2019")
)
year2019_tr
Most of the travels associated with less than $5000 in total. Lets filter the data above 4000 in total
high_total_f = year2019_f[year2019_f["total"] > 4000]
high_total_f
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 540 | Deputy Minister | 2019-02-24 | Hong Kong; Ho Chi Minh City, Vietnam; Hanoi, V... | 5145.12 | 0.00 | 1756.43 | 105.35 | 0.00 | 7006.90 | aafc-aac | 2019 |
| 542 | Associate Deputy Minister | 2019-02-19 | Washington, District of Columbia | 3807.96 | 116.92 | 738.00 | 357.82 | 0.00 | 5020.70 | aafc-aac | 2019 |
| 560 | Assistant Deputy Minister, International Affai... | 2019-01-16 | Berlin, Germany; Ankara, Turkey | 7671.18 | 142.86 | 1005.86 | 896.64 | 0.00 | 9716.54 | aafc-aac | 2019 |
| 588 | Associate Deputy Minister | 2019-05-09 | Niigata, Japan; Tokyo, Japan | 5333.48 | 41.20 | 918.83 | 386.28 | 0.00 | 6679.79 | aafc-aac | 2019 |
| 597 | Assistant deputy minister, Science and Technol... | 2019-04-22 | Tokyo, Japan | 6195.10 | 213.64 | 628.81 | 253.60 | 0.00 | 7291.15 | aafc-aac | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70245 | Deputy Minister | 2019-12-16 | Washington, DC | 3402.12 | 355.87 | 1083.53 | 499.80 | 12.43 | 5353.75 | wd-deo | 2019 |
| 70254 | Deputy Minister | 2019-08-19 | Calgary, AB; Ottawa, ON | 3003.42 | 265.93 | 875.66 | 522.25 | 41.41 | 4708.67 | wd-deo | 2019 |
| 70268 | Deputy Minister | 2019-09-12 | Palo Alto, CA; Ottawa, ON | 4043.17 | 734.38 | 5281.20 | 657.35 | 35.01 | 10751.11 | wd-deo | 2019 |
| 70272 | Deputy Minister | 2019-09-30 | Ottawa, ON; Seattle, WA | 4699.16 | 314.87 | 1356.02 | 532.85 | 12.43 | 6915.33 | wd-deo | 2019 |
| 70286 | Deputy Minister | 2019-11-18 | Ottawa, ON | 2661.75 | 204.82 | 935.44 | 473.10 | 57.59 | 4332.70 | wd-deo | 2019 |
1023 rows × 11 columns
high_total_sc = None
brush = alt.selection_interval()
high_total_sc = (
alt.Chart(high_total_f)
.mark_point(size=8)
.encode(
x=alt.X("start_date", title="Travel Dates"),
y=alt.Y("total", title="Total travel expenses", scale=alt.Scale(zero=False)),
color=alt.value("green"),
)
.properties(title="Expenses above $4000")
)
high_total_sc
To get the total expense for travel per organisation in year 2019,data is grouped by owner organisation
ow_org_gr = None
ow_org_gr = year2019_f.groupby(by="owner_org", as_index=False).agg(
{
"airfare": ["sum"],
"other_transport": ["sum"],
"lodging": ["sum"],
"meals": ["sum"],
"other_expenses": ["sum"],
"total": ["sum"],
}
)
ow_org_gr.columns = ["_".join(col).rstrip("_") for col in ow_org_gr.columns.values]
ow_org_gr
| owner_org | airfare_sum | other_transport_sum | lodging_sum | meals_sum | other_expenses_sum | total_sum | |
|---|---|---|---|---|---|---|---|
| 0 | aafc-aac | 254660.29 | 36954.97 | 106186.01 | 64773.41 | 8771.87 | 471366.30 |
| 1 | aandc-aadnc | 395150.83 | 26899.24 | 109406.59 | 56433.11 | 29149.33 | 618248.95 |
| 2 | acoa-apeca | 131959.05 | 27999.23 | 94718.43 | 53328.61 | 3007.37 | 311012.69 |
| 3 | aecl-eacl | 81855.92 | 23286.23 | 73138.53 | 34080.03 | 7085.52 | 219497.64 |
| 4 | atssc-scdata | 73250.83 | 23686.01 | 53509.71 | 30540.53 | 5765.31 | 186752.39 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 90 | tsb-bst | 14247.12 | 9549.77 | 17471.03 | 9934.10 | 4346.37 | 55548.39 |
| 91 | vac-acc | 316353.77 | 35399.96 | 139894.60 | 71829.15 | 3373.65 | 564963.61 |
| 92 | vrab-tacra | 44641.19 | 7935.80 | 25730.68 | 15529.32 | 1393.74 | 88896.14 |
| 93 | wage | 115968.87 | 20922.88 | 55760.38 | 23948.68 | 2453.48 | 218635.88 |
| 94 | wd-deo | 90593.10 | 24697.75 | 51309.99 | 23186.58 | 1620.46 | 191407.69 |
95 rows × 7 columns
ow_org_ex = None
ow_org_ex = (
alt.Chart(ow_org_gr)
.mark_bar()
.encode(
x=alt.X("owner_org:N", title="Owner Organisation", sort="y"),
y=alt.Y("total_sum:Q", title="Total Expenses"),
tooltip="total_sum",
)
.properties(title="Veriation of Total Expenses", width=800)
)
ow_org_ex
ow_org_ex1 = None
high_total_sc = None
click = alt.selection_multi()
ow_org_ex1 = (
alt.Chart(ow_org_gr)
.mark_bar()
.encode(
x=alt.X("owner_org:N", title="Owner_Organisation", sort="y"),
y=alt.Y("total_sum:Q", title="Total Expenses"),
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2)),
)
.add_selection(click)
.properties(width=900)
)
# ow_org_ex1
# brush alt.selection_interval()
high_total_sc = (
alt.Chart(high_total_f)
.mark_point(size=8)
.encode(
x=alt.X("start_date", title="Travel Dates"),
y=alt.Y("total", title="Total travel expenses", scale=alt.Scale(zero=False)),
color=alt.value("green"),
)
.add_selection(brush)
.properties()
)
# high_total_sc
# ow_org_ex1 | high_total_sc
click = alt.selection_multi(fields=["owner_org"])
ow_org_ex1 = ow_org_ex1.add_selection(click)
high_total_sc = high_total_sc.encode(
opacity=alt.condition(click, alt.value(0.9), alt.value(0.2))
)
(ow_org_ex1 | high_total_sc).properties(title= 'Comparison of expensive travels among owner organisations')
# click_travel = alt.selection_multi(fields=['owner_org'], bind='legend')
# high_total_sc = high_total_sc.encode(opacity=alt.condition(click_travel, alt.value(0.9), alt.value(0.2)))
# high_total_sc.add_selection(click_travel)
# ow_org_ex1 = ow_org_ex1.encode(opacity=alt.condition(click_travel, alt.value(0.9), alt.value(0.2)))
# (ow_org_ex1 | high_total_sc).add_selection(click_travel)
To find the answer to the following question, lets analyse the travel counts of top ten owner organisations
ow_org_top10 = None
ow_org_top10 = ow_org_gr.nlargest(10, "total_sum")
ow_org_top10
| owner_org | airfare_sum | other_transport_sum | lodging_sum | meals_sum | other_expenses_sum | total_sum | |
|---|---|---|---|---|---|---|---|
| 28 | dnd-mdn | 1955101.53 | 86756.76 | 419079.41 | 254313.98 | 30620.59 | 2677550.66 |
| 41 | ic | 829169.42 | 103160.14 | 243565.94 | 134571.74 | 3656.66 | 1314123.90 |
| 32 | esdc-edsc | 468771.18 | 101136.29 | 181070.63 | 112334.44 | 3016.79 | 866329.33 |
| 35 | fin | 598247.03 | 44878.01 | 130398.30 | 60418.91 | 6355.74 | 840481.19 |
| 29 | ec | 488215.71 | 32344.89 | 151837.52 | 74983.07 | 3320.72 | 750701.86 |
| 20 | cra-arc | 331276.48 | 77056.49 | 173763.01 | 96630.97 | 833.07 | 679560.02 |
| 55 | nrcan-rncan | 427077.24 | 31368.25 | 125421.17 | 59349.14 | 2327.19 | 645542.99 |
| 1 | aandc-aadnc | 395150.83 | 26899.24 | 109406.59 | 56433.11 | 29149.33 | 618248.95 |
| 27 | dfo-mpo | 356225.78 | 36372.89 | 110947.04 | 63464.65 | 4296.49 | 571306.85 |
| 91 | vac-acc | 316353.77 | 35399.96 | 139894.60 | 71829.15 | 3373.65 | 564963.61 |
Filtering associated data for top ten owner organisations
top10_data = None
top10_data = year2019_f[year2019_f['owner_org'].isin(['dnd-mdn', 'ic', 'esdc-edsc','fin', 'ec','cra-arc', 'nrcan-rncan', 'aandc-aadnc', 'dfo-mpo', 'vac-acc' ])]
top10_data
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 998 | A/Assistant Deputy Minister | 2019-02-22 | Nanaimo, British Columbia | 1153.90 | 0.00 | 0.00 | 0.00 | 46.00 | 1199.90 | aandc-aadnc | 2019 |
| 1007 | Senior Assistant Deputy Minister | 2019-12-01 | Vancouver | 3048.25 | 104.79 | 991.38 | 507.62 | 57.00 | 4863.11 | aandc-aadnc | 2019 |
| 1025 | Minister, Crown-Indigenous Relations | 2019-03-25 | Yellowknife, Northwest Territories, Canada | 2909.35 | 83.60 | 670.60 | 354.60 | 180.60 | 4198.75 | aandc-aadnc | 2019 |
| 1042 | Minister, Crown-Indigenous Relations | 2019-03-06 | Iqaluit, Nunavut, Canada | 552.37 | 90.00 | 472.50 | 26.95 | 135.45 | 1277.27 | aandc-aadnc | 2019 |
| 1044 | Minister, Crown-Indigenous Relations | 2019-03-13 | North Bay, Ontario, Canada | 1057.96 | 0.00 | 0.00 | 70.25 | 90.30 | 1218.51 | aandc-aadnc | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 68502 | DEPUTY MINISTER | 2019-10-09 | CHARLOTTETOWN, PE | 1104.61 | 0.00 | 0.00 | 0.00 | 0.00 | 1104.61 | vac-acc | 2019 |
| 68503 | DEPUTY MINISTER | 2019-11-13 | CHARLOTTETOWN, PE | 726.86 | 0.00 | 0.00 | 0.00 | 0.00 | 726.86 | vac-acc | 2019 |
| 68505 | Minister of Veterans Affairs | 2019-05-03 | Ottawa- Halifax-Ottawa | 1128.27 | 0.00 | 0.00 | 182.50 | 0.00 | 1310.77 | vac-acc | 2019 |
| 68506 | Minister of Veterans Affairs | 2019-06-14 | Toronto, ON | 1104.29 | 0.00 | 0.00 | 70.75 | 0.00 | 1175.04 | vac-acc | 2019 |
| 68507 | Minister of Veterans Affairs | 2019-05-30 | France & Rome | 8008.22 | 0.00 | 0.00 | 644.06 | 0.00 | 8652.28 | vac-acc | 2019 |
4354 rows × 11 columns
top10_data_trc = alt.Chart(top10_data).mark_bar().encode(x=alt.X('count()', title="count"),
y=alt.Y('owner_org:N', title="Owner Organisation", sort='x')).properties(title="Number of travels among top 10 organisations")
top10_data_trc
When comparing travel counts ic and dnd-mdn has considerable high counts than the rest of the organisations. Also there is no significant difference of travel counts among ic and dnd-mdn. But considering total expenses dnd_mdn spent more than 50% of ic.
Lets introduce new column as level of expenses, the total of expenses equal or above refered to as 'high' and the total of expenses below as 'low'
top10_new_col = None
top10_new_col = year2019_f[year2019_f['owner_org'].isin(['dnd-mdn', 'ic', 'esdc-edsc','fin', 'ec','cra-arc', 'nrcan-rncan', 'aandc-aadnc', 'dfo-mpo', 'vac-acc' ])]
top10_new_col.loc[top10_data['total'] >= 3000, 'lev_expenses'] = 'high'
top10_new_col.loc[top10_data['total'] < 3000, 'lev_expenses'] = 'low'
top10_new_col
C:\Users\pamod\anaconda3\lib\site-packages\pandas\core\indexing.py:1596: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy self.obj[key] = _infer_fill_value(value) C:\Users\pamod\anaconda3\lib\site-packages\pandas\core\indexing.py:1765: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy isetter(loc, value) C:\Users\pamod\anaconda3\lib\site-packages\pandas\core\indexing.py:1765: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy isetter(loc, value)
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | lev_expenses | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 998 | A/Assistant Deputy Minister | 2019-02-22 | Nanaimo, British Columbia | 1153.90 | 0.00 | 0.00 | 0.00 | 46.00 | 1199.90 | aandc-aadnc | 2019 | low |
| 1007 | Senior Assistant Deputy Minister | 2019-12-01 | Vancouver | 3048.25 | 104.79 | 991.38 | 507.62 | 57.00 | 4863.11 | aandc-aadnc | 2019 | high |
| 1025 | Minister, Crown-Indigenous Relations | 2019-03-25 | Yellowknife, Northwest Territories, Canada | 2909.35 | 83.60 | 670.60 | 354.60 | 180.60 | 4198.75 | aandc-aadnc | 2019 | high |
| 1042 | Minister, Crown-Indigenous Relations | 2019-03-06 | Iqaluit, Nunavut, Canada | 552.37 | 90.00 | 472.50 | 26.95 | 135.45 | 1277.27 | aandc-aadnc | 2019 | low |
| 1044 | Minister, Crown-Indigenous Relations | 2019-03-13 | North Bay, Ontario, Canada | 1057.96 | 0.00 | 0.00 | 70.25 | 90.30 | 1218.51 | aandc-aadnc | 2019 | low |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 68502 | DEPUTY MINISTER | 2019-10-09 | CHARLOTTETOWN, PE | 1104.61 | 0.00 | 0.00 | 0.00 | 0.00 | 1104.61 | vac-acc | 2019 | low |
| 68503 | DEPUTY MINISTER | 2019-11-13 | CHARLOTTETOWN, PE | 726.86 | 0.00 | 0.00 | 0.00 | 0.00 | 726.86 | vac-acc | 2019 | low |
| 68505 | Minister of Veterans Affairs | 2019-05-03 | Ottawa- Halifax-Ottawa | 1128.27 | 0.00 | 0.00 | 182.50 | 0.00 | 1310.77 | vac-acc | 2019 | low |
| 68506 | Minister of Veterans Affairs | 2019-06-14 | Toronto, ON | 1104.29 | 0.00 | 0.00 | 70.75 | 0.00 | 1175.04 | vac-acc | 2019 | low |
| 68507 | Minister of Veterans Affairs | 2019-05-30 | France & Rome | 8008.22 | 0.00 | 0.00 | 644.06 | 0.00 | 8652.28 | vac-acc | 2019 | high |
4354 rows × 12 columns
top10_data_trc1 = alt.Chart(top10_new_col).mark_bar().encode(x=alt.X('count()', title="count"),
y=alt.Y('owner_org:N', title="Owner Organisation", sort='x'),
color=alt.Color('lev_expenses:N')).properties(title="Count of high & low travels ")
#top10_data_trc1
text = alt.Chart(top10_new_col).mark_text(dx=-15, dy=3, color='white').encode(x=alt.X('count()', stack='zero'),
y=alt.Y('owner_org:N', sort='x'),
detail = 'lev_expenses:N',
text=alt.Text('count()'))
top10_data_trc1 + text
Above graph gives an answer to the question, why dnd-mdn has highest travel expenses. Compare to all other nine organisations dnd-mdn has the highest number of travels and 30% of the travels categorized as high. Eventhough ic and dnd-mdn both have approximately equal numner travels but 10% of total travels represent as high in ic. Therefore it is better to analyse why the 30% of the dnd-mdn travels falls under high level of expenses
Lets filter total of expenses above 3000 in 2019
dnd_high_tr = year2019_f[(year2019_f['owner_org'] == 'dnd-mdn') & (year2019_f['total'] >= 3000)]
dnd_high_tr
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 22198 | Commander CANSOFCOM | 2019-02-17 | Ouagadougou, Burkina Faso | 12777.02 | 106.18 | 840.28 | 357.35 | 0.00 | 14080.83 | dnd-mdn | 2019 |
| 22237 | Assistant Deputy Minister | 2019-03-23 | Riga, Latvia and Brussels Belgium | 9289.81 | 100.00 | 1189.79 | 739.86 | 170.42 | 11489.88 | dnd-mdn | 2019 |
| 22281 | ADM(Pol) | 2019-01-26 | The Hague, Netherlands | 8347.22 | 228.54 | 484.16 | 421.00 | 0.00 | 9481.14 | dnd-mdn | 2019 |
| 22283 | ADM(Pol) | 2019-02-11 | Brussels, Belgium\r\nMunich, Germany | 7807.00 | 207.16 | 627.35 | 499.00 | 0.00 | 9140.86 | dnd-mdn | 2019 |
| 22286 | DGIS Pol | 2019-01-21 | Hanoi, Vietnam\r\nSingapore | 7743.44 | 67.93 | 2657.05 | 1068.00 | 151.16 | 11687.61 | dnd-mdn | 2019 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23358 | Commandant - NATO Defense College | 2019-09-26 | Washington, New York, Norfolk, Ottawa | 0.00 | 0.00 | 3464.35 | 1312.15 | 256.81 | 5033.31 | dnd-mdn | 2019 |
| 23359 | Commandant - NATO Defense College | 2019-11-01 | Ottawa (Ontario) Canada | 6010.30 | 0.00 | 470.08 | 393.58 | 0.00 | 6873.96 | dnd-mdn | 2019 |
| 23365 | Deputy Vice Chief of the Defence Staff | 2019-10-13 | Wuhan, China | 9681.64 | 19.79 | 0.00 | 818.75 | 0.00 | 10520.18 | dnd-mdn | 2019 |
| 23375 | Chief of Reserves and Employer Support (CRES) | 2019-11-17 | Ottawa, Canada | 1412.17 | 275.43 | 875.50 | 510.10 | 0.00 | 3073.20 | dnd-mdn | 2019 |
| 23414 | Chaplain General | 2019-09-29 | Brussels, France | 1777.30 | 80.50 | 436.43 | 672.41 | 615.49 | 3582.13 | dnd-mdn | 2019 |
247 rows × 11 columns
Lets see the impact of travel expenses to the total
dnd_hmelt = None
dnd_hmelt = dnd_high_tr.melt(
id_vars=["title_en", "start_date", "destination_en", "total","owner_org"],
value_vars=[
"airfare",
"other_transport",
"lodging",
"meals",
"other_expenses",
],
var_name="expenses",
value_name="value",
)
dnd_hmelt
| title_en | start_date | destination_en | total | owner_org | expenses | value | |
|---|---|---|---|---|---|---|---|
| 0 | Commander CANSOFCOM | 2019-02-17 | Ouagadougou, Burkina Faso | 14080.83 | dnd-mdn | airfare | 12777.02 |
| 1 | Assistant Deputy Minister | 2019-03-23 | Riga, Latvia and Brussels Belgium | 11489.88 | dnd-mdn | airfare | 9289.81 |
| 2 | ADM(Pol) | 2019-01-26 | The Hague, Netherlands | 9481.14 | dnd-mdn | airfare | 8347.22 |
| 3 | ADM(Pol) | 2019-02-11 | Brussels, Belgium\r\nMunich, Germany | 9140.86 | dnd-mdn | airfare | 7807.00 |
| 4 | DGIS Pol | 2019-01-21 | Hanoi, Vietnam\r\nSingapore | 11687.61 | dnd-mdn | airfare | 7743.44 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1230 | Commandant - NATO Defense College | 2019-09-26 | Washington, New York, Norfolk, Ottawa | 5033.31 | dnd-mdn | other_expenses | 256.81 |
| 1231 | Commandant - NATO Defense College | 2019-11-01 | Ottawa (Ontario) Canada | 6873.96 | dnd-mdn | other_expenses | 0.00 |
| 1232 | Deputy Vice Chief of the Defence Staff | 2019-10-13 | Wuhan, China | 10520.18 | dnd-mdn | other_expenses | 0.00 |
| 1233 | Chief of Reserves and Employer Support (CRES) | 2019-11-17 | Ottawa, Canada | 3073.20 | dnd-mdn | other_expenses | 0.00 |
| 1234 | Chaplain General | 2019-09-29 | Brussels, France | 3582.13 | dnd-mdn | other_expenses | 615.49 |
1235 rows × 7 columns
dnd_mdnex = None
dnd_mdnex = alt.Chart(dnd_hmelt).mark_bar().encode(x=alt.X("total:Q", title="Total of expenses", bin=alt.Bin(maxbins=45)),
y=alt.Y("value:Q", title="Expenses"),
color=alt.Color("expenses")).properties(width=200, height=150).facet('expenses', title="Expenses vs Total cost")
dnd_mdnex
Above faceted plots indicates, the total of expenses mostly impacted by airfare
As travel expenses data represent in very large range for the analysis of titles only consider the total above and equal to 10000
dnd_mdnhst = dnd_hmelt[dnd_hmelt['total'] >=10000]
dnd_mdnhst
| title_en | start_date | destination_en | total | owner_org | expenses | value | |
|---|---|---|---|---|---|---|---|
| 0 | Commander CANSOFCOM | 2019-02-17 | Ouagadougou, Burkina Faso | 14080.83 | dnd-mdn | airfare | 12777.02 |
| 1 | Assistant Deputy Minister | 2019-03-23 | Riga, Latvia and Brussels Belgium | 11489.88 | dnd-mdn | airfare | 9289.81 |
| 4 | DGIS Pol | 2019-01-21 | Hanoi, Vietnam\r\nSingapore | 11687.61 | dnd-mdn | airfare | 7743.44 |
| 6 | Deputy Minister | 2019-02-11 | Brussels | 10475.63 | dnd-mdn | airfare | 8949.90 |
| 7 | Commander, Canadian Army | 2019-01-24 | Toronto, Kuwait | 13583.30 | dnd-mdn | airfare | 12543.44 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1208 | Special Assistant - Western Desk | 2019-11-27 | East West Flight Passes | 15060.00 | dnd-mdn | other_expenses | 0.00 |
| 1211 | Deputy Commander, RCN | 2019-10-04 | Sydney, Australia | 13184.41 | dnd-mdn | other_expenses | 95.97 |
| 1226 | Commander CFINTCOM | 2019-10-14 | Sofia, Bulgaria; Riga Latvia | 12104.12 | dnd-mdn | other_expenses | 22.59 |
| 1228 | ADM(Pol) | 2019-12-15 | Kiev, Ukraine | 12392.14 | dnd-mdn | other_expenses | 0.00 |
| 1232 | Deputy Vice Chief of the Defence Staff | 2019-10-13 | Wuhan, China | 10520.18 | dnd-mdn | other_expenses | 0.00 |
385 rows × 7 columns
dnd_mdn_tit = alt.Chart(dnd_mdnhst).mark_bar().encode(x=alt.X('title_en', title="Title", sort='y'),
y=alt.Y('total', title="Total"),
color=alt.Color('expenses'),
tooltip='destination_en').properties(title="Titles responsible for expensive travels")
dnd_mdn_tit
When considering the total expenses veriation in titles, the above graph shows minister of national defense among the top. And the other all titles also related to the execative staff and all are overseas travel
allmdn_tr = new_colyear[new_colyear['owner_org'] == 'dnd-mdn']
allmdn_tr
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20773 | DOS SJS | 2016-10-24 | Washington - Dictrict of Columbia | 821.00 | 31.74 | 700.43 | 170.11 | 0.00 | 1723.28 | dnd-mdn | 2016 |
| 20774 | Commander of 1 Canadian Air Division/Canadian ... | 2016-09-26 | Tyndall Air Force Base, Florida, United State... | 507.36 | 0.00 | 432.86 | 302.82 | 0.00 | 1243.04 | dnd-mdn | 2016 |
| 20775 | Commander of 1 Canadian Air Division/Canadian ... | 2016-11-14 | Ottawa, Ontario, Canada | 487.06 | 148.37 | 740.24 | 405.70 | 94.92 | 1876.29 | dnd-mdn | 2016 |
| 20776 | Assistant Deputy Minister (Science & Technology) | 2016-09-17 | Rome, Italy and Sydney, Australia | 14351.47 | 216.66 | 2340.06 | 1734.75 | 0.00 | 18642.94 | dnd-mdn | 2016 |
| 20777 | Assistant Deputy Minister (Science & Technology) | 2016-10-26 | Washington, D.C. | 0.00 | 40.00 | 474.05 | 23.14 | 0.00 | 537.19 | dnd-mdn | 2016 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 23466 | Exempt Staff | 2020-04-01 | All Canada | 0.00 | 0.00 | 0.00 | 0.00 | 596.64 | 596.64 | dnd-mdn | 2020 |
| 23467 | Executive Assistant | 2020-08-15 | Ottawa (Ontario) Canada | 245.83 | 0.00 | 300.00 | 0.00 | 0.00 | 545.83 | dnd-mdn | 2020 |
| 23468 | Assistant Deputy Minster (Materiel) | 2020-12-01 | Ottawa (Ontario) Canada | 0.00 | 22.25 | 0.00 | 0.00 | 0.00 | 22.25 | dnd-mdn | 2020 |
| 23469 | Assistant Deputy Minster (Materiel) | 2021-01-13 | Ottawa (Ontario) Canada | 0.00 | 84.50 | 0.00 | 0.00 | 0.00 | 84.50 | dnd-mdn | 2021 |
| 23470 | Assistant Deputy Minster (Materiel) | 2021-02-04 | Ottawa (Ontario) Canada | 0.00 | 60.00 | 0.00 | 0.00 | 0.00 | 60.00 | dnd-mdn | 2021 |
2693 rows × 11 columns
allmdn_trp = alt.Chart(allmdn_tr).mark_bar().encode(x=alt.X('start_date', title="Start date of travels"),
y=alt.Y('count():Q', title = "Number of Travels"),
color=alt.Color('year:N')).properties(title='Travel history of dnd-mdn')
allmdn_trp
This indicates dnd-mdn has started from year 2015 and gradulally increase its number of travels and reduced in 2020 onwards
Lets concentrate the travel cost above 3000
tr_ex2016 = new_colyear[(new_colyear['year'] > 2015) & (new_colyear['year'] < 2019)]
tr_ex2016
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.00 | 19103.46 | aafc-aac | 2017 |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.00 | 1472.18 | aafc-aac | 2017 |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.00 | 1044.76 | aafc-aac | 2017 |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.00 | 1611.00 | aafc-aac | 2017 |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.00 | 863.55 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70218 | Assistant Deputy Minister (Alberta) | 2018-12-03 | Edmonton, AB | 0.00 | 72.18 | 0.00 | 0.00 | 5.42 | 77.60 | wd-deo | 2018 |
| 70219 | Assistant Deputy Minister (Saskatchewan) | 2018-12-04 | Saskatoon, SK | 0.00 | 145.86 | 0.00 | 0.00 | 8.75 | 154.61 | wd-deo | 2018 |
| 70220 | Assistant Deputy Minister (British Columbia) | 2018-05-07 | Ottawa, ON; Winnipeg, MB | 1111.92 | 147.47 | 701.58 | 310.40 | 12.43 | 2306.38 | wd-deo | 2018 |
| 70221 | Deputy Minister | 2018-02-14 | Vancouver, BC | 560.12 | 202.32 | 467.66 | 241.50 | 12.43 | 1484.03 | wd-deo | 2018 |
| 70222 | Assistant Deputy Minister (Saskatchewan) | 2017-10-15 | Calgary, AB | 725.33 | 122.98 | 50.00 | 148.40 | 12.43 | 1059.14 | wd-deo | 2017 |
23341 rows × 11 columns
htr_ex2016 = tr_ex2016[tr_ex2016['total'] >= 3000]
htr_ex2016
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.00 | 19103.46 | aafc-aac | 2017 |
| 7 | Chief Information Officer, Information Service... | 2017-02-24 | Phoenix, AZ | 896.88 | 233.26 | 1607.34 | 457.95 | 0.00 | 3195.43 | aafc-aac | 2017 |
| 17 | Assistant Deputy Minister, Strategic Policy Br... | 2017-03-17 | Boston, MA & Guelph, Ont. | 809.66 | 255.32 | 1597.84 | 570.93 | 0.00 | 3233.75 | aafc-aac | 2017 |
| 18 | Assistant Deputy Minister, Strategic Policy Br... | 2017-04-18 | Edmonton, Alta. & Toronto, Ont. | 2139.08 | 132.00 | 456.34 | 260.15 | 113.00 | 3100.57 | aafc-aac | 2017 |
| 24 | Associate Assistant Deputy Minister, Science a... | 2017-05-14 | London, UL & Rothamsted, UK | 7560.54 | 385.92 | 951.15 | 733.66 | 2.95 | 9634.22 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70133 | Deputy Minister | 2018-11-19 | Ottawa, ON | 3602.55 | 260.18 | 644.00 | 251.95 | 12.43 | 4771.11 | wd-deo | 2018 |
| 70137 | Assistant Deputy Minister (Saskatchewan) | 2018-10-20 | Paris, Amiens, Lille, France; Antwerp, Leuven,... | 1962.02 | 1693.00 | 4036.69 | 2445.35 | 12.43 | 10149.49 | wd-deo | 2018 |
| 70164 | Deputy Minister | 2018-12-10 | Ottawa, ON; Vancouver, BC | 3642.35 | 268.49 | 1125.78 | 416.75 | 12.43 | 5465.80 | wd-deo | 2018 |
| 70183 | Deputy Minister | 2018-04-22 | Calgary, AB; Vancouver, BC | 1410.41 | 270.91 | 1008.46 | 395.30 | 12.43 | 3097.51 | wd-deo | 2018 |
| 70184 | Deputy Minister | 2018-09-13 | Winnipeg, MB; Buenos Aires, Argentina | 6475.35 | 247.98 | 1994.47 | 724.06 | 12.43 | 9454.29 | wd-deo | 2018 |
3499 rows × 11 columns
Lets melt the above dataframe to findout top 10 owner organisations
alt.data_transformers.disable_max_rows()
hiow_org_melt = None
hiow_org_melt = htr_ex2016.melt(
id_vars=["title_en", "start_date", "destination_en", "total","owner_org","year"],
value_vars=[
"airfare",
"other_transport",
"lodging",
"meals",
"other_expenses",
],
var_name="expenses",
value_name="value")
#title_names = sorted(tr_exall["owner_org"].unique())
#alow_orgc = alt.Chart(title_names).mark_bar().encode(x=alt.X('year:N', title="Count of travels"),
#y=alt.Y('owner_org:N', title="Owner Organisation", sort='x'))
hiow_org_melt
| title_en | start_date | destination_en | total | owner_org | year | expenses | value | |
|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 19103.46 | aafc-aac | 2017 | airfare | 13555.46 |
| 1 | Chief Information Officer, Information Service... | 2017-02-24 | Phoenix, AZ | 3195.43 | aafc-aac | 2017 | airfare | 896.88 |
| 2 | Assistant Deputy Minister, Strategic Policy Br... | 2017-03-17 | Boston, MA & Guelph, Ont. | 3233.75 | aafc-aac | 2017 | airfare | 809.66 |
| 3 | Assistant Deputy Minister, Strategic Policy Br... | 2017-04-18 | Edmonton, Alta. & Toronto, Ont. | 3100.57 | aafc-aac | 2017 | airfare | 2139.08 |
| 4 | Associate Assistant Deputy Minister, Science a... | 2017-05-14 | London, UL & Rothamsted, UK | 9634.22 | aafc-aac | 2017 | airfare | 7560.54 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17490 | Deputy Minister | 2018-11-19 | Ottawa, ON | 4771.11 | wd-deo | 2018 | other_expenses | 12.43 |
| 17491 | Assistant Deputy Minister (Saskatchewan) | 2018-10-20 | Paris, Amiens, Lille, France; Antwerp, Leuven,... | 10149.49 | wd-deo | 2018 | other_expenses | 12.43 |
| 17492 | Deputy Minister | 2018-12-10 | Ottawa, ON; Vancouver, BC | 5465.80 | wd-deo | 2018 | other_expenses | 12.43 |
| 17493 | Deputy Minister | 2018-04-22 | Calgary, AB; Vancouver, BC | 3097.51 | wd-deo | 2018 | other_expenses | 12.43 |
| 17494 | Deputy Minister | 2018-09-13 | Winnipeg, MB; Buenos Aires, Argentina | 9454.29 | wd-deo | 2018 | other_expenses | 12.43 |
17495 rows × 8 columns
hiow_org_gr = hiow_org_melt.groupby(by='owner_org').agg({'total':['sum']})
hiow_org_gr.columns = ["_".join(col).rstrip("_") for col in hiow_org_gr.columns.values]
hiow_org_gr = hiow_org_gr.reset_index()
hiow_org_gr
| owner_org | total_sum | |
|---|---|---|
| 0 | aafc-aac | 4038557.30 |
| 1 | aandc-aadnc | 1578087.25 |
| 2 | acoa-apeca | 1021884.20 |
| 3 | aecl-eacl | 809463.05 |
| 4 | atssc-scdata | 890841.70 |
| ... | ... | ... |
| 69 | tc | 2944742.25 |
| 70 | tsb-bst | 19666.90 |
| 71 | vac-acc | 3409726.85 |
| 72 | vrab-tacra | 117031.10 |
| 73 | wd-deo | 1425802.10 |
74 rows × 2 columns
hiow_org_top10 = None
hiow_org_top10 = hiow_org_gr.nlargest(10, "total_sum")
hiow_org_top10
| owner_org | total_sum | |
|---|---|---|
| 25 | dnd-mdn | 1.686179e+07 |
| 31 | fin | 1.001316e+07 |
| 29 | fcac-acfc | 6.942717e+06 |
| 14 | cic | 6.232095e+06 |
| 26 | ec | 6.099791e+06 |
| 45 | nrcan-rncan | 5.554599e+06 |
| 28 | esdc-edsc | 4.579573e+06 |
| 24 | dfo-mpo | 4.519547e+06 |
| 0 | aafc-aac | 4.038557e+06 |
| 55 | pch | 3.607127e+06 |
Lets filter the associated data to the top 10 owner organisations and findout the counts of travels by counting entries of the owner organisation because each entry equal to travel
hiow_og10 = None
hiow_og10 = tr_ex2016[tr_ex2016['owner_org'].isin([ 'dnd-mdn','fin','fcac-acfc','cic','ec', 'esdc-edsc', 'nrcan-rncan', 'dfo-mpo','aafc-aac','pch'])]
hiow_og10
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac | 2017 |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac | 2017 |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac | 2017 |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.0 | 1611.00 | aafc-aac | 2017 |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.0 | 863.55 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59469 | Deputy Minister | 2018-11-09 | Montreal, Quebec, Canada | 0.00 | 52.00 | 236.82 | 104.85 | 0.0 | 393.67 | pch | 2018 |
| 59470 | Deputy Minister | 2018-11-13 | Gatineau, Quebec, Canada | 0.00 | 47.85 | 0.00 | 0.00 | 0.0 | 47.85 | pch | 2018 |
| 59610 | Director of Parliamentary Affairs | 2018-08-06 | Toronto, (Ontario) | 564.07 | 0.00 | 0.00 | 0.00 | 0.0 | 564.07 | pch | 2018 |
| 59701 | Chief of Staff | 2018-12-13 | Vancouver, (British Columbia) | 201.20 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | pch | 2018 |
| 59702 | Minister of Canadian Heritage and Multicultura... | 2018-12-13 | Vancouver, (British Columbia) | 201.20 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | pch | 2018 |
9473 rows × 11 columns
hiow_top10_tr = None
hiow_top10_tr = alt.Chart(hiow_og10).mark_bar().encode(x=alt.X('count()', title="Count of travels"),
y=alt.Y('owner_org', title="Owner organisation", sort='x'),
color=alt.Color('year:N')).properties(width=500, height=250)
text = alt.Chart(hiow_og10).mark_text(dx=-15, dy=3, color='white').encode(x=alt.X('count()', stack='zero'),
y=alt.Y('owner_org', sort='x'),
detail = 'year:N',
text=alt.Text('count()')).properties(title='Count of travels')
hiow_top10_tr + text
Above counts of travel indicate the esdc-edsc has the highest number of travels form 2016 to 2018
It is not necessarily to be high in total expenses when count of travels are high because if all the travels are related within the Canada, htere is a possibility to result lower total expenses. Therefore for further conformation lets see how the total expenses vary among the top owner organisations and lets see esdc-edsc has the highest total expenses.
tr_ex2016_2019 = None
tr_ex2016_2019hi = None
tr_ex2016_2019 = new_colyear[(new_colyear['year'] > 2015)&(new_colyear['year'] < 2020)]
tr_ex2016_2019
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.00 | 19103.46 | aafc-aac | 2017 |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.00 | 1472.18 | aafc-aac | 2017 |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.00 | 1044.76 | aafc-aac | 2017 |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.00 | 1611.00 | aafc-aac | 2017 |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.00 | 863.55 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 70298 | Assistant Deputy Minister (Alberta) | 2019-12-16 | Calgary, AB | 0.00 | 136.04 | 223.42 | 75.55 | 12.43 | 447.44 | wd-deo | 2019 |
| 70299 | Assistant Deputy Minister (British Columbia) | 2019-12-16 | Edmonton, AB | 92.40 | 180.57 | 357.02 | 194.20 | 12.32 | 836.51 | wd-deo | 2019 |
| 70300 | Assistant Deputy Minister, Policy and Strategi... | 2019-12-16 | Edmonton, AB | 1055.51 | 137.26 | 786.80 | 371.10 | 12.43 | 2363.10 | wd-deo | 2019 |
| 70305 | Assistant Deputy Minister (Saskatchewan) | 2019-11-14 | Saskatoon, Saskatchewan | 0.00 | 114.50 | 0.00 | 0.00 | 7.33 | 121.83 | wd-deo | 2019 |
| 70316 | Assistant Deputy Minister (British Columbia) | 2019-12-09 | Vancouver, British Columbia | 0.00 | 28.35 | 0.00 | 0.00 | 0.00 | 28.35 | wd-deo | 2019 |
34298 rows × 11 columns
Here there is an exception has done because when compare the top 10 organisations in 2019 and the top ten in the period of 2016 to 2018, all the organisations are same except the organisation ic, prominet in 2019. Therefore for the top10 organisations from the year2016 to 2019 the owner organisation ic also has been included for the total expenses analysis
tr_ex2016_2019hi = tr_ex2016_2019[tr_ex2016_2019['owner_org'].isin([ 'dnd-mdn','fin','fcac-acfc','cic','ec', 'esdc-edsc', 'nrcan-rncan', 'dfo-mpo','aafc-aac','pch','ic'])]
tr_ex2016_2019hi
| title_en | start_date | destination_en | airfare | other_transport | lodging | meals | other_expenses | total | owner_org | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Assistant Deputy Minister, Market and Services... | 2017-02-24 | Hanoi, VN & New Delhi, IN | 13555.46 | 150.00 | 3451.37 | 1946.63 | 0.0 | 19103.46 | aafc-aac | 2017 |
| 1 | Assistant Deputy Minister, Market and Services... | 2017-05-11 | Winnipeg, Man. | 1090.03 | 72.64 | 147.81 | 161.70 | 0.0 | 1472.18 | aafc-aac | 2017 |
| 2 | Assistant Deputy Minister, Market and Services... | 2017-05-20 | Charlottetown, P.E.I. | 426.29 | 320.59 | 200.18 | 97.70 | 0.0 | 1044.76 | aafc-aac | 2017 |
| 3 | Assistant Deputy Minister, Science and Technol... | 2017-03-21 | Winnipeg, MB | 670.69 | 136.27 | 461.19 | 342.85 | 0.0 | 1611.00 | aafc-aac | 2017 |
| 4 | Assistant Deputy Minister, Science and Technol... | 2017-04-11 | London, ON | 507.55 | 83.18 | 157.07 | 115.75 | 0.0 | 863.55 | aafc-aac | 2017 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59767 | Press Secretary | 2019-12-19 | Montreal, (Québec) | 0.00 | 108.97 | 0.00 | 91.50 | 0.0 | 200.47 | pch | 2019 |
| 59783 | Chief Staff | 2019-09-08 | Toronto, (Ontario) | 922.73 | 0.00 | 0.00 | 0.00 | 0.0 | 923.73 | pch | 2019 |
| 59789 | Senior Assistant Deputy Minister, Cultural Aff... | 2019-10-13 | Frankfurt, (Germany) | 2606.80 | 191.80 | 2974.77 | 1794.43 | 0.0 | 7568.00 | pch | 2019 |
| 59790 | Assistant Deputy Minister, Sport, Major Events... | 2019-12-02 | Montreal, (Quebec) | 0.00 | 81.77 | 0.00 | 40.95 | 15.0 | 137.71 | pch | 2019 |
| 59793 | Policy Advisor | 2019-05-08 | Rouyn-Noranda / Abitibi / Val d'Or, (Quebec) | 343.55 | 0.00 | 0.00 | 0.00 | 0.0 | 343.55 | pch | 2019 |
14100 rows × 11 columns
tr_ex2016_2019den = None
tr_ex2016_2019den = alt.Chart(tr_ex2016_2019hi).mark_bar().encode(x=alt.X('total', title="total of expnses"),
y=alt.Y('owner_org', title='Owner_organisation',stack="normalize", sort='x'),
color=alt.Color('year:N')).properties(title='Total expenses veriation', width=500, height=250)
#tr_exyear = sorted(tr_ex2016_2019hi['year'].unique())
#radiobuttons_year= alt.binding_radio(name='Year', options=tr_exyear)
#select_tr_exyear = alt.selection_single(
#fields=['year:N'],
#bind={'year:N':tr_exyear})
#tr_ex2016_2019den.add_selection(tr_exyear).encode(opacity=alt.condition(select_tr_exyear, alt.value(0.7), alt.value(0.05)))
tr_ex2016_2019den
presentation_tr_ex = None
presentation_tr_ex_title = alt.TitleParams(text='Annual National Defence Travel Expenses', subtitle="Ministry official's trvel expenses, National defence came first", anchor='middle')
presentation_tr_ex = (ow_org_ex1 | high_total_sc).properties(title= 'Comparison of expensive travels among owner organisations') & (hiow_top10_tr + text | tr_ex2016_2019den)
presentation_tr_ex = presentation_tr_ex.properties(title=presentation_tr_ex_title)
presentation_tr_ex
My aim was to findout the reason for the sudden increase of the travel expenses from 2016 to 2019. When comparing the three years, 2019 travel expenses was so high. Therefore, I first concentrated the data associated to the year 2019. From the first step, "Relationship between organisation travels," identified most of the travel expenses less than 5000. From the plot, "Veriation of total Expenses," identified dnd-mdn was the main contributer for the travel expenses. When compared with the plot, " Expenses above 4000, " it shows dnd-mdn has considerable amount of expensive travels.
The second step quantitatively analysed the travels among owner organisations for the questiona: Why dnd-mdn has the highest expenses? The top 10 organisations for travel expenses have been subjected for the analysis. When comparing the count of travels dnd-mdn shows highest count but there is no significant difference compared to ic. After categorising the travels which cost 4000 above as "high" and below as "low", dnd-mdn shows highest number of travels in the high category compared to all top 10 organisations. The analysis confirmed dnd-mdn is the main contributer for the travel increase in the year 2019.
Next, the curiosity was extended to see why the dnd-mdn has expensive travels compare to the other organisations. In this step, considered expenses associated for the total cost. The faceted bar plots clearly shows airfare was the main contributer to the total expenses. Next another question came up, whose responsible for these travels, for this titles were considered, those who spent more than 10000 for thier travels. Minister of National Defense came first in the plot "Titles responsible for expensive travels" and other titles represents the executive category and all these travels associated with overseas travels, which makes airfare appear the most impacted by the total cost of travel in 2019.
The story of travel expenses is not yet completed because exploratory data analysis shows a period of time from year 2016 to 2019 where a sudden increase of travel expenses occured. Next, data from 2016 to 2019 were considered as top ten organisations that have been identified and obtained counts for travels in each organisation yearly. The esdc-edsc shows the highest count for travels in the plot "Count of travels".Reason for this is dnd-mdm has started his travels according to the data from 2015 and it has increased gradually. But when compared with the total of expenses dnd-mdn is the highest and it is more than 50% higher than any of other organisations. According to this data, its confirmed that the dnd-mdn travels main contributer to the travel expenses increased in the peried of 2016 to 2019.
It is a great work that has been done by the ministry officials disclosing their travel expenses to reflect the transperancy. During the analysis, a question came up to my mind: Why the dnd-mdn disclose their travel expenses since year 2015? Does that mean no travels would have occured prior to 2015?
According to the data which has been disclosed, dnd-mdn show gradual increase of travel expenses from 2015 to 2019 and a decrease in 2020 due to the pendemic travel band. Even though year by year shows tremendous increase of travel expenses it is not possible to come to a conclusion due to many reasons: entry errors - We have noticed few data points have been entered as negative values which is impossible to be negative, had to drop such data points,there could be entry errors which we have not noticed. Missing information also could have been possible among organisations.
Therefore, it is essential to improve the quality of data by avoiding the free text columns as much as possible. For example, if title and destination column would be replaced by the dropdown menu instead of free text we can improved the quality of data in great extent.
Therefore all the conclusions are made above based on the data disclosed and it is necessary to do more reseach, analysis, and improve on how to do proper conclusions.
completed by: Pamodya Siriwardana Date: 2021-05-22